Clustering Multivariate Normal Distributions

Authors

  • Frank Nielsen
  • Richard Nock
Abstract

In this paper, we consider the task of clustering multivariate normal distributions with respect to the relative entropy into a prescribed number, k, of clusters using a generalization of Lloyd’s k-means algorithm [1]. We revisit this information-theoretic clustering problem under the auspices of mixed-type Bregman divergences, and show that the approach of Davis and Dhillon [2] (NIPS*06) can also be derived directly, by applying the Bregman k-means algorithm, once the proper vector/matrix Legendre transformations are defined. We further explain the dualistic structure of the sided k-means clustering, and present a novel k-means algorithm for clustering with respect to the symmetrical relative entropy, the J-divergence. Our approach extends to differential entropic clustering of arbitrary members of the same exponential families in statistics.
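The k-means scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the sided variant in which each input Gaussian p_i is assigned to the centroid c minimizing KL(p_i || c), and it uses the standard fact that the minimizer of the average KL(p_i || c) over Gaussians c is obtained by moment matching (averaging the means and the second moments). The function names `kl_gaussians`, `moment_matched_centroid`, and `kmeans_gaussians` are hypothetical, as is the use of NumPy.

```python
import numpy as np

def kl_gaussians(m0, S0, m1, S1):
    """Relative entropy KL(N(m0, S0) || N(m1, S1)) for full-rank covariances."""
    d = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + logdet1 - logdet0)

def moment_matched_centroid(means, covs):
    """Gaussian c minimizing the average KL(p_i || c): match first and second moments."""
    mu = means.mean(axis=0)
    second = (covs + np.einsum("ni,nj->nij", means, means)).mean(axis=0)
    return mu, second - np.outer(mu, mu)

def kmeans_gaussians(means, covs, k, n_iter=20, seed=0):
    """Lloyd-style clustering of Gaussians (assumed orientation: KL(p_i || centroid))."""
    rng = np.random.default_rng(seed)
    n = means.shape[0]
    idx = rng.choice(n, size=k, replace=False)  # seed centroids from the inputs
    c_means, c_covs = means[idx].copy(), covs[idx].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid in relative entropy.
        dist = np.array([[kl_gaussians(means[i], covs[i], c_means[j], c_covs[j])
                          for j in range(k)] for i in range(n)])
        labels = dist.argmin(axis=1)
        # Update step: moment-matched centroid of each non-empty cluster.
        for j in range(k):
            members = labels == j
            if members.any():
                c_means[j], c_covs[j] = moment_matched_centroid(
                    means[members], covs[members])
    return labels, c_means, c_covs

# Four unit-covariance Gaussians in 2D, forming two well-separated groups.
demo_means = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 0.0], [10.5, 0.0]])
demo_covs = np.stack([np.eye(2)] * 4)
demo_labels, _, _ = kmeans_gaussians(demo_means, demo_covs, k=2)
```

Swapping the argument order of the divergence gives the other sided variant, whose centroid has a different closed form; the symmetrical J-divergence case mentioned in the abstract is not covered by this sketch.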


Similar resources

Robust Fuzzy Classification Maximum Likelihood Clustering with Multivariate t-Distributions

Mixtures of distributions have been used as probability models for clustering data. Classification maximum likelihood (CML) procedure is a popular mixture of maximum likelihood approach to clustering. Yang (1993) extended CML to fuzzy CML (FCML) for a normal mixture model, called FCML-N. However, normal distributions are not robust for outliers. In general, t-distributions should be more robust...


A Comparison of Information Criteria in Clustering Based on Mixture of Multivariate Normal Distributions

Clustering analysis based on a mixture of multivariate normal distributions is commonly used in the clustering of multidimensional data sets. Model selection is one of the most important problems in mixture cluster analysis based on the mixture of multivariate normal distributions. Model selection involves the determination of the number of components (clusters) and the selection of an appropri...


Comparing Mean Vectors Via Generalized Inference in Multivariate Log-Normal Distributions

In this paper, we consider the problem of means in several multivariate log-normal distributions and propose a method called the generalized variable method. Simulation studies show that the suggested method has appropriate size and power regardless of sample size. To evaluate this method, we compare it with traditional MANOVA such that the actual sizes of the two methods ...


Pattern Clustering by Multivariate Mixture Analysis.

Cluster analysis is reformulated as a problem of estimating the parameters of a mixture of multivariate distributions. The maximum-likelihood theory and numerical solution techniques are developed for a fairly general class of distributions. The theory is applied to mixtures of multivariate normals (NORMIX) and mixtures of multivariate Bernoulli distributions (Latent Classes). The feasibili...


Rejoinder to the discussion of "Model-based clustering and classification with non-normal mixture distributions"

Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular, the skew normal and skew t-mixture models, are emerging as promising extensions to the traditional normal and t-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into fou...


Optimal clustering of multivariate normal distributions using divergence and its application to HMM adaptation

We present an optimal clustering algorithm for grouping multivariate normal distributions into clusters using the divergence, a symmetric, information-theoretic distortion measure based on the Kullback-Leibler distance. Optimal solutions for normal distributions are shown to be obtained by solving a set of Riccati matrix equations and the optimal centroids are found by alternating the mean and c...




Publication date: 2008